A Hybrid MPI/OpenMP 3D FFT for Plane Wave First-principles Materials Science Codes

Author

  • A. Canning
Abstract

First-principles electronic structure calculations based on a plane wave expansion of the wavefunctions are the most commonly used approach for electronic structure calculations in materials and nanoscience. In this approach the electronic wavefunctions are expanded in Fourier components, and 3D FFTs are used to construct the charge density in real space. Efficient parallel 3D FFTs are also required by many other application codes, for example in fluid mechanics, climate research, and accelerator design. Because 3D parallel FFTs require a large amount of communication, the scaling of these application codes on large parallel machines depends critically on having a 3D FFT that scales efficiently to large processor counts. With the recent increase in the number of cores per chip/node, the simple model of running one MPI process per core on large node counts results in a large number of small messages, causing network contention and latency issues. In this paper we show that a hybrid MPI/OpenMP implementation of our 3D FFT on the Cray XT5 can significantly outperform the pure MPI version, particularly on large processor counts, by sending fewer, larger messages. Our hybrid 3D FFT has been implemented in the electronic structure code PEtot and allowed us to perform simulations of 4000-atom PbSe quantum rods on up to 21,600 cores on the Cray XT5.
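
The "fewer, larger messages" argument can be illustrated with a rough count. The sketch below is not the paper's implementation: it assumes a simple all-to-all transpose in which every MPI rank sends one equally sized block to every other rank, and the function name `alltoall_stats`, the 512^3 grid, the 16 bytes per complex value, and the thread counts are hypothetical choices made for illustration. Only the 21,600-core figure comes from the abstract.

```python
def alltoall_stats(cores, threads_per_rank, grid_points, bytes_per_point=16):
    """Estimate message count and size for one all-to-all transpose.

    Assumption (not from the paper): each rank owns grid_points / ranks
    values and splits them evenly into one block per rank.
    """
    if cores % threads_per_rank:
        raise ValueError("cores must be a multiple of threads_per_rank")
    ranks = cores // threads_per_rank          # MPI processes
    messages = ranks * (ranks - 1)             # every rank to every other rank
    bytes_per_message = grid_points * bytes_per_point / ranks**2
    return ranks, messages, bytes_per_message


if __name__ == "__main__":
    grid = 512**3  # hypothetical FFT grid size
    for threads in (1, 6, 12):  # hypothetical OpenMP threads per MPI rank
        ranks, messages, size = alltoall_stats(21600, threads, grid)
        print(f"{threads:2d} threads/rank: {ranks:6d} ranks, "
              f"{messages:12d} messages, {size:10.1f} bytes each")
```

Under these assumptions, using t threads per rank cuts the number of messages by roughly a factor of t^2 and enlarges each message by the same factor, while the total data volume stays about the same. That is the regime in which per-message latency and network contention matter less.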

Similar resources

Performance analysis of pure MPI versus MPI+OpenMP for Jacobi Iteration and a 3D FFT on the Cray XT5

Today many high performance computers are collections of shared memory compute nodes with each compute node having one or more multi-core processors. When writing parallel programs for these machines, one can use pure MPI or various hybrid approaches using MPI and OpenMP. Since OpenMP threads are lighter weight than MPI processes, one would expect that hybrid approaches will achieve better perf...

Parallel Fourier Transformations using shared memory nodes

The Fast Fourier Transform (FFT) is of great importance for various scientific applications used in High Performance Computing (HPC). However, a detailed performance analysis shows that the FFT routines used in these applications prevent them from scaling to large processor counts. The All-to-All type communication required inside these transformation routines, which becomes extremely costly w...

A Hybrid MPI-OpenMP Implementation of an Implicit Finite-Element Code on Parallel Architectures

The hybrid MPI-OpenMP model is a natural parallel programming paradigm for emerging parallel architectures that are based on symmetric multiprocessor (SMP) clusters. This paper presents a hybrid implementation adapted for an implicit finite-element code developed for groundwater transport simulations. The original code was parallelized for distributed memory architectures using MPI (Message Pa...

Comparing Compiler and Library Performance in Material Science Applications on Edison

Materials science and chemistry applications are expected to represent approximately one third of the computational workload on NERSC’s Cray XC30 system, Edison. The performance of these applications can often depend sensitively on the compiler and compiler options used at build-time. For this reason, the NERSC user services group supplies users with optimized builds of the most commonly used m...

Hybrid MPI-OpenMP Parallelism in the ONETEP Linear-Scaling Electronic Structure Code: Application to the Delamination of Cellulose Nanofibrils.

We present a hybrid MPI-OpenMP implementation of Linear-Scaling Density Functional Theory within the ONETEP code. We illustrate its performance on a range of high performance computing (HPC) platforms comprising shared-memory nodes with fast interconnect. Our work has focused on applying OpenMP parallelism to the routines which dominate the computational load, attempting where possible to paral...

Publication date: 2012